A Computational Framework for Host-Pathogen Protein-Protein Interactions
Infectious diseases cause millions of illnesses and deaths every year and raise serious health concerns worldwide. Monitoring and curing infectious diseases remains a prevalent and intractable problem. Because host-pathogen interactions are considered the key infection processes at the molecular level, a large body of research has focused on them to understand infection mechanisms and to develop novel therapeutic solutions. Over the years, the continuous development of biological technologies has benefited wet-lab experiments, from small-scale biochemical, biophysical, and genetic experiments to large-scale methods such as yeast two-hybrid analysis and cryogenic electron microscopy. As a result of decades of effort, biological data have accumulated explosively, including multi-omics data such as genomics and proteomics data.
Chapter 2 therefore opens with a review of omics data, demonstrating recent developments in ‘omics’ studies with a particular focus on proteomics and genomics. High-throughput technologies have further accelerated the growth of ‘omics’ data, and the surge of interest in data analytics for bioinformatics comes as no surprise to researchers from a variety of disciplines. In particular, the astonishing rate at which genomics and proteomics data are generated has led researchers into the realm of ‘Big Data’. Chapter 2 thus provides an update on the omics background and the state-of-the-art developments in the omics area, with a focus on genomics data, from the perspective of big data analytics.
APEX2S: A Two-Layer Machine Learning Model for Discovery of Host-Pathogen Protein-Protein Interactions on Cloud-Based Multiomics Data
Faced with an avalanche of biological interaction data, computational biology now confronts greater challenges in big data analysis and calls for more studies that mine and integrate cloud-based multiomics data, especially data related to infectious diseases. Meanwhile, machine learning techniques have recently succeeded in a range of computational biology tasks. In this article, we focus on the study of host-pathogen protein-protein interactions, aiming to apply machine learning techniques to learn from the interaction data and make predictions. A comprehensive and practical workflow for harnessing different cloud-based multiomics data is discussed. In particular, a novel two-layer machine learning model, APEX2S, is proposed for the discovery of protein-protein interactions. The results show that our model can better learn from, and predict, the accumulated host-pathogen protein-protein interactions.
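The two-layer idea can be illustrated with a small stacked-classifier sketch. Everything here is invented for illustration: the feature names, base learners, weights, and threshold are assumptions, not details from the APEX2S paper.

```python
# Hypothetical sketch of a two-layer (stacked) predictor for protein pairs.
# Layer 1: independent base learners score a candidate host-pathogen pair;
# Layer 2: a meta-learner combines their outputs into a final prediction.

def base_sequence_score(pair):
    # Layer 1, learner A: decision from a (toy) sequence-similarity feature.
    return 1.0 if pair["seq_sim"] > 0.5 else 0.0

def base_expression_score(pair):
    # Layer 1, learner B: decision from a (toy) co-expression feature.
    return 1.0 if pair["coexpr"] > 0.3 else 0.0

def meta_predict(pair, weights=(0.6, 0.4), threshold=0.5):
    # Layer 2: weighted combination of base-learner outputs; in a real
    # stacked model the weights would themselves be learned.
    s = weights[0] * base_sequence_score(pair) + weights[1] * base_expression_score(pair)
    return int(s >= threshold)

print(meta_predict({"seq_sim": 0.8, "coexpr": 0.1}))  # 1: predicted interaction
print(meta_predict({"seq_sim": 0.2, "coexpr": 0.1}))  # 0: no interaction
```

The point of the second layer is that it can learn how much to trust each base learner, rather than fixing a single decision rule up front.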
Honest Score Client Selection Scheme: Preventing Federated Learning Label Flipping Attacks in Non-IID Scenarios
Federated Learning (FL) is a promising technology that enables multiple actors to build a joint model without sharing their raw data. Its distributed nature makes FL vulnerable to various poisoning attacks, including model poisoning attacks and data poisoning attacks. Many Byzantine-resilient FL methods have been introduced to mitigate model poisoning attacks, while their effectiveness against data poisoning attacks remains unclear. In this paper, we focus on the most representative data poisoning attack, the "label flipping attack", and measure its effectiveness against existing FL methods. The results show that existing FL methods perform similarly in independent and identically distributed (IID) settings but fail to maintain model robustness in non-IID settings. To mitigate these weaknesses, we introduce the Honest Score Client Selection (HSCS) scheme and the corresponding HSCSFL framework. In HSCSFL, the server collects a clean dataset for evaluation. In each iteration, the server collects the gradients from clients and then performs HSCS to select aggregation candidates. The server first evaluates the per-class performance of the global model and generates a corresponding risk vector indicating which classes could potentially be under attack. Similarly, the server evaluates each client's model and records its per-class performance as an accuracy vector. The dot product of each client's accuracy vector and the global risk vector gives the client's honest score; only the top p% of clients by honest score are included in the following aggregation. Finally, the server aggregates the gradients and uses the outcome to update the global model. Comprehensive experimental results show that HSCSFL effectively enhances FL robustness and defends against the label flipping attack.
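The selection step described above can be sketched as follows, assuming the per-class accuracies have already been measured on the server's clean dataset. The risk-vector definition (one minus per-class global accuracy) and all numbers are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of Honest Score Client Selection (HSCS): score each client
# by how well it still performs on the classes the global model struggles
# with, then keep only the top-p fraction for aggregation.

def honest_score(client_acc, risk):
    # Dot product of the client's per-class accuracy and the risk vector.
    return sum(a * r for a, r in zip(client_acc, risk))

def select_clients(client_accs, global_acc, p=0.5):
    # Risk vector: classes the global model handles poorly are high-risk
    # (illustrative choice: risk = 1 - per-class global accuracy).
    risk = [1.0 - a for a in global_acc]
    scores = {cid: honest_score(acc, risk) for cid, acc in client_accs.items()}
    k = max(1, int(len(scores) * p))
    return sorted(scores, key=scores.get, reverse=True)[:k]

global_acc = [0.9, 0.4]            # class 1 looks attacked (low accuracy)
client_accs = {
    "honest":  [0.9, 0.8],         # still accurate on the risky class
    "flipper": [0.9, 0.1],         # degraded on the risky class -> low score
}
print(select_clients(client_accs, global_acc, p=0.5))  # ['honest']
```

A label-flipping client scores low precisely because its accuracy collapses on the classes the risk vector weights most heavily.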
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
The recent performance leap of Large Language Models (LLMs) opens up new opportunities across numerous industrial applications and domains. However, erroneous generations, such as false predictions, misinformation, and hallucinations, have raised severe concerns about the trustworthiness of LLMs, especially in safety-, security-, and reliability-sensitive scenarios, potentially hindering real-world adoption. While uncertainty estimation has shown its potential for interpreting the prediction risks of general machine learning (ML) models, little is known about whether, and to what extent, it can help explore an LLM's capabilities and counteract its undesired behavior. To bridge the gap, in this paper we initiate an exploratory study of LLM risk assessment through the lens of uncertainty. In particular, we experiment with twelve uncertainty estimation methods and four LLMs on four prominent natural language processing (NLP) tasks to investigate to what extent uncertainty estimation techniques can characterize the prediction risks of LLMs. Our findings validate the effectiveness of uncertainty estimation for revealing LLMs' uncertain or non-factual predictions. Beyond general NLP tasks, we also conduct extensive experiments with four LLMs for code generation on two datasets, and find that uncertainty estimation can potentially uncover buggy programs generated by LLMs. Insights from our study shed light on the future design and development of reliable LLMs, facilitating further research toward enhancing their trustworthiness.
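As one illustration of the kind of measure this family of methods builds on, here is a sketch of predictive entropy computed over a next-token probability distribution. The probability vectors are made up for the example; a real LLM would supply them, and the paper evaluates twelve different estimation methods, not just this one.

```python
# Predictive entropy as a simple token-level uncertainty signal:
# H(p) = -sum_i p_i * log(p_i); higher entropy = less confident prediction.
import math

def predictive_entropy(probs):
    # Skip zero-probability entries, where p * log(p) is defined as 0.
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = [0.97, 0.01, 0.01, 0.01]   # mass concentrated on one token
uncertain = [0.25, 0.25, 0.25, 0.25]   # uniform over four tokens

print(predictive_entropy(confident) < predictive_entropy(uncertain))  # True
```

A uniform distribution over k tokens attains the maximum entropy log(k), which is one reason entropy-style scores are a natural first probe for flagging risky generations.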
Taming Gradient Variance in Federated Learning with Networked Control Variates
Federated learning, a decentralized approach to machine learning, faces significant challenges such as extensive communication overheads, slow convergence, and unstable improvements. These challenges primarily stem from gradient variance due to heterogeneous client data distributions. To address this, we introduce a novel Networked Control Variates (FedNCV) framework for federated learning. We adopt REINFORCE Leave-One-Out (RLOO) as the fundamental control variate unit in the FedNCV framework, implemented at both the client and server levels. At the client level, the RLOO control variate is employed to optimize local gradient updates, mitigating the variance introduced by data samples. Once relayed to the server, the RLOO-based estimator further provides an unbiased, low-variance aggregated gradient, leading to robust global updates. This dual-side application is formalized as a linear combination of composite control variates. We provide a mathematical expression capturing this integration of double control variates within FedNCV and present three theoretical results with corresponding proofs. This unique dual structure equips FedNCV to address data heterogeneity and scalability issues, potentially paving the way for large-scale applications. Moreover, we tested FedNCV on six diverse datasets under a Dirichlet distribution with α = 0.1 and benchmarked its performance against six state-of-the-art methods, demonstrating its superiority.
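The RLOO baseline at the heart of the control-variate unit can be sketched in a few lines. This shows only the generic leave-one-out centering that RLOO is built on, not the paper's full dual-side FedNCV construction.

```python
# REINFORCE Leave-One-Out (RLOO) baseline: each sample's reward is centered
# by the mean of the *other* k-1 samples' rewards. Because the baseline for
# sample i never depends on sample i itself, the centering reduces variance
# without biasing the gradient estimator.

def rloo_advantages(rewards):
    k = len(rewards)
    total = sum(rewards)
    # Leave-one-out baseline for sample i: (total - r_i) / (k - 1).
    return [r - (total - r) / (k - 1) for r in rewards]

adv = rloo_advantages([1.0, 2.0, 3.0])
print(adv)       # [-1.5, 0.0, 1.5]
print(sum(adv))  # 0.0: the centered advantages sum to zero
```

FedNCV, as described above, applies this unit twice: once per client over local data samples, and once at the server over client gradients.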
Real-Time Management of Groundwater Resources Based on Wireless Sensor Networks
Groundwater plays a vital role in arid inland river basins, where groundwater management is critical to the sustainable development of the regional economy and ecology. Traditional sustainable management approaches either analyze different scenarios subject to assumptions or construct simulation–optimization models to obtain an optimal strategy. However, the groundwater system is time-varying due to exogenous inputs, so groundwater management based on static data quickly becomes outdated. The Daman irrigation district, part of the Heihe River Basin (HRB), a typical arid river basin in Northwestern China, was selected as the study area in this paper. First, a simulation–optimization model was constructed to optimize the pumping rates of the study area according to groundwater level constraints. Three different groundwater level constraints were assigned to explore sustainable strategies for groundwater resources. The results indicated that the simulation–optimization model was capable of identifying optimal pumping yields while satisfying the given constraints. Second, the simulation–optimization model was integrated with wireless sensor network (WSN) technology to give the management real-time features, allowing observations, constraints, and decision variables to be updated in real time. Furthermore, a web-based platform was developed to facilitate the decision-making process. By combining the simulation–optimization model with WSN techniques, this study attempted real-time monitoring and management of a scarce groundwater resource, which could be used to support decision-making related to sustainable management.
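The simulation–optimization loop can be caricatured in a few lines: simulate the groundwater level for a candidate pumping rate, then pick the largest rate that respects the level constraint. The linear drawdown response and all numbers below are invented assumptions, not the study's calibrated model; the constraint argument stands in for the value a WSN observation would update in real time.

```python
# Toy simulation-optimization sketch for pumping-rate selection.

def simulate_level(pumping, initial_level=30.0, drawdown_per_unit=0.02):
    # Stand-in for the groundwater simulation model: level falls linearly
    # with the pumping rate (purely illustrative response).
    return initial_level - drawdown_per_unit * pumping

def optimize_pumping(level_constraint, candidates):
    # Stand-in for the optimization layer: grid search over candidate
    # pumping rates, keeping those whose simulated level stays feasible.
    feasible = [q for q in candidates if simulate_level(q) >= level_constraint]
    return max(feasible) if feasible else 0.0

rates = range(0, 1001, 50)
print(optimize_pumping(level_constraint=28.0, candidates=rates))  # 100
```

In the real-time setting described above, new sensor readings would tighten or relax `level_constraint` (and recalibrate the simulator) before each re-optimization.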